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Tensor Numerical Methods in Quantum Chemistry: 
from Hartree-Fock Energy to Excited States 

V. KHOROMSKAIA* ** B. N. KHOROMSKIJ” 


Abstract 

We resume the recent successes of the grid-based tensor numerical methods and discuss 
their prospects in real-space electronic structure calculations. These methods, based on the 
low-rank representation of the multidimensional functions and integral operators, first appeared 
as an accurate tensor calculus for the 3D Hartree potential using ID complexity operations, 
and have evolved to entirely grid-based tensor-structured 3D Hartree-Fock eigenvalue solver. 

It benefits from tensor calculation of the core Hamiltonian and two-electron integrals (TEI) 
in O(nlogn) complexity using the rank-structured approximation of basis functions, electron 
densities and convolution integral operators all represented on 3D nxnxn Cartesian grids. The 
algorithm for calculating TEI tensor in a form of the Cholesky decomposition is based on multiple 
factorizations using algebraic ID “density fitting" scheme , which yield an almost irreducible 
number of product basis functions involved in the 3D convolution integrals, depending on a 
threshold e > 0. The basis functions are not restricted to separable Gaussians, since the 
analytical integration is substituted by high-precision tensor-structured numerical quadratures. 

The tensor approaches to post-Hartree-Fock calculations for the MP2 energy correction and for 
the Bethe-Salpeter excited states, based on using low-rank factorizations and the reduced basis 
method, were recently introduced. Another direction is related to the recent attempts to develop 
a tensor-based Hartree-Fock numerical scheme for finite lattice-structured systems, where one 
of the numerical challenges is the summation of electrostatic potentials of a large number of 
nuclei. The 3D grid-based tensor method for calculation of a potential sum on a L x L x L 
lattice manifests the linear in L computational work, 0{L), instead of the usual 0{L^ logL) 
scaling by the Ewald-type approaches. The accuracy of the order of atomic radii, h ^ 10“'^A, 
for the grid representation of electrostatic potentials is achieved due to low cost of using ID 
operations on large 3D grids. 

Key words: Electronic structure calculations, two-electron integrals, multidimensional integrals, 
tensor decompositions, quantized tensor approximation, low-rank Cholesky factorization, reduced 
higher order SVD, Hartree-Fock equation, excited states, lattice potential sums. 
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1 Introduction 

The problems of numerical modeling the many-particle interactions in large molecular systems, 
lattice structured metallic clusters and crystals, proteins and nanomaterials are the most challeng¬ 
ing tasks in modern computational physics and chemistry. The traditional approaches for these 
multidimensional problems are restricted to the concepts which have well recognized limitations 
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for computations with higher accuracy and for larger non-periodic molecular systems, as well as 
for efficient calculation of excited states. Recent advances in numerical analysis for multidimen¬ 
sional problems and significant achievements in high performance computing suggest new creative 
approaches to these problems. 

The Hartree-Fock (HF) equation governed by the 3D integral-differential operator |66l [28] is 
one of the basic models for ab initio calculation of the ground state energy of molecular systems. 
It is a strongly nonlinear eigenvalue problem (EVP) in a sense, that one should solve the equation 
in a self-consistent way when the integral part of the governing operator depends on the solution 
itself. Multiple strong singularities in the electron density of a molecule due to nuclear cusps impose 
strong requirements on the accuracy of calculations. 

Commonly used numerical methods for solution of the Hartree-Fock equation are based on the 
analytical computation of the arising two-electron integrals (convolution type integrals in in 
the problem adapted naturally separable Gaussian-type bases [1], by using er/-function expansions. 
This rigorous approach resulted in a number of efficient implementations which are widely used in 
computational quantum chemistry. The success of the analytical integration methods stems from 
the big amount of precomputed information based on the physical insight including the construction 
of problem adapted atomic orbitals basis sets and elaborate nonlinear optimization for calculation 
of density fitting basis. The known limitations of this approach appear due to a strong dependence 
of the numerical efficiency on the size and quality of the chosen Gaussian basis sets, that might be 
crucial for larger molecular clusters and heavier atoms. 

The intention to replace or assist the analytical calculations for the Hartree-Fock problem 
by a data-sparse grid-based numerical schemes has a long history. First success was the grid- 
based numerical method for the diatomic molecules in [^, though this approach was not feasible 
to compact (3D) molecules. The wavelet multiresolution schemes |12] capable for the accurate 
representation of nuclear cusps, have been applied to electronic structure simulations since the 
seminal papers [271162], and recently this approach was further advanced due to achievements in 
the high performance supercomputing (62] jS] [19] [T7]. However, due to extensive computational 
resources, the entirely wavelet-based or sparse-grid approaches |23l[73] are limited so far only to 
rather small atomic systems with few electrons |2]. Tensor hyper-contraction decompositions in 
density fitting schemes have been analyzed in [311 EH]. 

The newly developed tensor-structured numerical methods, both the name and the concept, 
appeared during the work on the grid-based tensor approach to the solution of the 3D Hartree- 
Fock problem [371 EH] E]. The central point is the representation of d-variate functions and 
integral/differential operators on large grids and their approximation in the low-rank tensor 
formats, which allows numerical calculations in 0{dn) complexity instead of 0{n'^) by conventional 
methods. 

In this paper we summarize the main benefits of the tensor numerical methods in electronic 
structure calculations, and discuss further prospects of this approach in several directions, such as 

• algebraic directional density fitting and tensor factorization of the two-electron integrals and 
their use in the Hartree-Fock calculations; 

• tensor decompositions in the MP2 energy correction, and in calculation of excited states based 
on the Bethe-Salpeter equation; 

• fast tensor summation of electrostatic potentials on large 3D lattice for efficient calculation 
of the Fock matrix in the case of lattice-structured systems; 

• fast calculation of the interaction energy of the long-range potentials on large 3D lattices. 
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Notice that basic rank-structured tensor formats such as the canonical (PARAFAC) and Tucker 
tensor decompositions have been since long used in the computer science for the quantitative 
analysis of correlations in the experimental multidimensional data arrays in data processing and 
chemometrics, see |49] and references therein. In 2006 the exceptional properties of the Tucker 
decomposition for the discretized multidimensional functions have been revealed in ISaET], where 
it was proven that for a class of function-related tensors the approximation error of the Tucker 
decomposition decays exponentially with respect to the Tucker rank. This gives the opportunity 
to represent the discretized multidimensional functions and integral (convolution) operators in an 
algebraically separable form and thus reduce the numerical treatment of the multidimensional 
transforms to ID operations. It was shown that the number of vectors in such a representation 
depends only logarithmically on the size of the d-dimensional grid, 0(n^), used for discretization 
of multivariate functions. 

The above results have led to the idea to calculate the Hartree potential and the Coulomb and 
exchange operators by numerical quadratures [38] using Gaussian-type basis functions discretized 
on 3D Cartesian grids and the fast tensor-product convolution [33|. In this way, the efficient low- 
rank canonical tensor representation to the Newton kernel was an essential contribution [Tj. To 
reduce the initial rank of the electron density, that is quadratically proportional to the number of 
CTO basis functions, the canonical-to-Tucker tensor transform was invented [381137j . which made 
computations for large tensor grids and extended molecules tractable (even in Matlab). 

The initial version of tensor-structured algorithms for solving Hartree-Fock equation employed 
the 3D grid-based calculation of the Coulomb and exchange integral operators ” on-the-fly”, thus 
avoiding precomputation and storage of the TEI tensor [321 El- particular, it was justihed that 
tensor calculus allows to reduce the 3D convolution integrals to combinations of ID convolutions, 
and ID Hadamard and scalar products. Besides, these results promoted spreading of the tensor- 
structured methods in the community of numerical analysis [sniMiiss], and further development 
of the tree-tensor formats like tensor-train [56] and hierarchical Tucker representations [25]. 

Further development of tensor methods in electronic structure calculations was due to the 
fast algorithm for the grid-based computation of the TEI tensor [33] in 0{N^) storage in the 
number of basis functions Ni,. The fourth order TEI tensor is calculated in a form of the Cholesky 
factorization by using the algebraic ID ’’density fitting" scheme, which applies to the product basis 
functions. Imposing the low-rank tensor representation of the product basis functions and the 
Newton convolving kernel all discretized on n x n x n Cartesian grid, the 3D integral transforms 
are calculated in 0(n log n) complexity. Given the factorized TEI, the update of the Column and 
exchange parts in the Fock matrix reduces to the cheap algebraic operations. Other steps are tensor 
calculation of the core Hamiltonian and the efficient MP2 energy correction scheme [44| . which all 
together gave rise to the black-box Hartree-Fock solver [35]. 

Due to the grid representation of basis functions basis sets are now not restricted to Gaussian- 
type orbitals and allowed to be any well-separable function defined on a grid. The tensor-based 
Hartree-Fock solver is competitive in computational time and accuracy with the standard packages 
based on analytical calculation of integrals. High accuracy is attained owing to easy calculations 
on large 3D grids up to = 10^®, so that the resolution with mesh size h of the order of atomic 
radii, is possible. 

Motivated by tensor decompositions in the MP2 scheme, the new approach for calculation of 
the excited states in the framework of the Bethe-Salpeter equation was recently introduced [B], that 
employs the reduced basis method in combination with low-rank tensor approximations. 

Further developments of the tensor methods in multi-particle simulations are focused on the 
large lattice structures in a box and nearly periodic systems, for which the tensor approach may 
reduce computational costs dramatically m- One of the challenging problems in the numerical 
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treatment of the crystalline-type molecular clusters is summation of a large number of electrostatic 
potentials distributed on a finite (non-periodic) lattice. The novel grid-based method for summation 
of the long-range potentials in the canonical and Tucker formats |46ll48j works on Lx Lx L lattices 
with the computational cost 0{L) instead of 0{L^ log L) by the traditional Ewald-type methods 
|18j . The required precision is guaranteed by employing large 3D Cartesian grids for representation 
of potentials. The method remains efficient for lattices with non-rectangular geometries and in the 
presence of multiple defects @81117!. 

The rest of the paper is organized as follows. In Section [2l we overview the basic tensor formats 
and show why the orthogonal Tucker tensor decomposition, originating from computer science and 
data processing, became useful for the treatment of the multidimensional functions and operators 
in numerical analysis. In particular, 112.31 and ^2.41 discuss the matrix product states (tensor train) 
formats and the novel quantics tensor approximation method, respectively, while 112.51 addresses the 
basic tensor-structured multilinear algebra operations. Section [3] describes tensor calculus of the 
multidimensional convolution transform on examples of the Hartree potential f ll3.ip and the TEI 
tensor 1 ^3.2p . and recalls the main building blocks in the tensor-based black-box Hartree-Eock solver. 
Section m shows benehts of the tensor approach in MP2 calculations and in calculation of the lowest 
part of the excitation energies in the framework of the Bethe-Salpeter equation. Section [5] describes 
the benehts of tensor methods in applications to lattice-type molecular systems. In particular, we 
present fast tensor method for the summation of the long-range interaction potentials on a 3D lattice 
by using the assembled vectors of their canonical and Tucker tensor representations. The important 
application of this approach to calculation of interaction energy of the Coulombic potentials on a 
lattice with sub-linear cost in the lattice size is described in detail. Appendix discusses some 
computational details on the Canonical-to-Tucker tensor transform, the Galerkin discretization 
scheme for the nonlinear Hartree-Eock equation, and the basics of the low-rank canonical tensor 
representation for the Newton kernel. 

2 Rank-structured tensor representations of discretized functions 
and operators 

In this section we discuss shortly why the rank-structured tensors, which were traditionally used in 
experimental data processing, appears to be useful for the separable representation of multivariate 
functions and operators represented on tensor product grids. The canonical and Tucker tensor 
decompositions have been since long used in the computer science community for the quantitative 
analysis of correlations in the experimental multidimensional data arrays in chemometrics and data 
processing, with rather weak requirements on the accuracy, see |631149] and references therein. In 
the recent decade the class of matrix product states type tensor formats became popular in the 
simulations of quantum spin systems and quantum molecular dynamics. We refer to @3 ED [36] 
for recent literature surveys on commonly used tensor formats. 

2.1 Canonical and Tucker tensor formats 

We consider a tensor of order d, as a real multidimensional array A = G '^nix...xnd 

numbered by a d-tuple index se10, with multi-index notation i = G n^},^ = 

l,...,d. It is an element of the linear vector space equipped with the Euclidean scalar 

^The alternative notation A = [A(ii, ...,id)] can be utilized. 
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product, 


ni rid 

(A, B) = 

i \—1 'i'd — 1 

Euclidean vectors and matrices are the special case of dth order tensors. For a general tensor, 
the required storage scales exponentially in the dimension, nin 2 • • • (the so-called ’’curse of 
dimensionality"). To get rid of exponential scaling in the dimension, one can apply the rank- 
structured separable representations of multidimensional tensors. The simplest separable element 
is given by rank-1 canonical tensor, 

U = (g) ... (g) G l^nix...xnd^ 


with entries 


u. 




= u 


( 1 ) 


(d) 


requiring only n\ + ... -|- rid numbers to store it. 

A tensor in the i?-term canonical format is defined by 


U = 




R 




( 1 ) 


u 


(d) 


Cfc € M, 


( 2 . 1 ) 


where ’ G are normalized vectors, and R is called the canonical rank of a tensor. It 

is convenient to introduce the so-called side matrices = [u^^^...u^^] G I = 1,2,3, 

it) 

obtained by concatenation of the canonical vectors u), , k = l,...i?. Now the storage cost is 
bounded by dRn. For d > 3 computation of the canonical rank of a tensor U, i.e. the minimal 
number R in representation (12.11) and the respective decomposition, is an N-P hard problem. In 
the case d = 2 the representation dZH) is merely a rank-i? matrix. 

We say that a tensor V is represented in the rank-r orthogonal Tucker format with the rank 
parameter r = (ri, ...,rd), if 


ri Td 

V= ^ ... ^^^^,.,.,^^vW(g)...(g)v^J, £ = l,...,d, (2.2) 

‘"1=1 ‘"<i=i 

where G R"'^} represents the set of orthonormal vectors, and (d = G is the 

Tucker core tensor. The storage cost for the Tucker tensor is bounded by drn+r'^, r = maxr^. In the 
case d = 2, the orthogonal Tucker decomposition is equivalent to the singular value decomposition 
(SVD) of a rectangular matrix. Figure 12.11 visualizes the canonical and Tucker tensors in the case 
d = 3. 



Figure 2.1: Visualizing the canonical and Tucker tensors for d = 3. 
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Canonical tensor decomposition leads to an ill-posed problem and, up to our best knowledge, 
there are no robust algebraic algorithms for the canonical approximation of an arbitrary tensor. For 
some classes of analytic functions, the explicit low-rank approximation can be constructed in ana¬ 
lytic form by using smc-quadrature approximation to the Laplace transform. The robust methods 
for Tucker decomposition are based on the orthogonal projections using the higher order singular 
value decomposition [15] (HOSVD) that is the generalization of the singular value decomposition 
(SVD) of matrices. 

The rank-structured decomposition (approximation) of multidimensional tensors provides 
means for the separation of variables in the discretized representation of multivariate functions 
and operators, and thus the possibility to substitute multidimensional algebraic transforms by 
univariate operations. Notice that in the computer science community these possibilities were 
restricted to moderate-dimensional tensors of small mode size (with rather large rank parameters) 
obtained from the experimental data sets. 

2.2 Tucker decomposition for function related tensors. Canonical-to-Tucker 
approximation 

In 2006 it was first verified numerically, see in [37], that the rank of the (fixed accuracy) Tucker 
approximation to some function related tensors depends only logarithmically on the size of the 
discretization grid. For a given continuous function / : —)• M, := c m 3, we 

introduced the function related 3rd order tensor, obtained by Galerkin discretization in a volume 
box using n x n x n 3D Cartesian grid. In particular, the low-rank tensor approximations were 
calculated for functions of the Slater type f{x) = exp(—a||x||), Newton kernel f{x) = n^, Yukawa 

potential f{x) = ^ y" , and the Helmholtz potential f{x) = x £ M^. 

Numerical tests demonstrated that the error of the rank-r (with r = (r, r, r)) Tucker approxi¬ 
mation applied to these third order tensors decays exponentially with respect to the Tucker rank r. 
Moreover, for fixed approximation error, the Tucker rank r scales as O(logn) for above mentioned 
functions |34j . 


Slater 


1 
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Figure 2.2: The Tucker approximation error in the Frobenius norm vs. the Tucker rank r for a 
single Slater function. 

This application revealed a number of properties of the Tucker format which were not known 
before. In particular, it was showed that for tensors resulting from the discretization of physically 
relevant multidimensional functions on the tensor grids one can hnd algebraically a reduced sub¬ 
space in each mode of the original tensor, thus approximating the function by separated variables 
using a tensor product of a relatively small number of vectors. 

Figures [22] and [2]3] (right) show the exponential convergence of the Tucker tensor approximation 
in the Tucker rank r of the Slater function in the relative Frobenius norm, Ep]\f = Figures 
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Figure 2.3: The Tucker approximation error in the Frobenius norm vs. the Tucker rank r for a sum 
of Slater potentials over the cubic 8x8x8 lattice (right). 

demonstrate that the exponential decay of approximation error is nearly the same for both a 
single potential and for a sum of the same potentials distributed in nodes of a cubic lattice. This 
property of the Tucker decomposition will be further gainfully applied to the fast lattice summation 
of interaction potentials. 

Motivated by these observations, we invented the canonical-to-Tucker (C2T) decomposition 
for function related tensors m, based on the reduced higher order singular value decomposition 
(RHOSVD) (Theorem 2.5, [38])- The canonical-to-Tucker algorithm (see Appendix) combined with 
the Tucker-to-canonical transform serves for reducing the ranks of canonical tensors with large R. 
Figure [23] in its ALS part shows the algorithmic step for ^ = 1, which is repeated for every mode 



Figure 2.4: Part of the canonical-to-Tucker decomposition algorithm for mode ^ = 1. 

^ = 1,2,3 (see Appendix). The computational work for the multigrid tensor decomposition C2T 
algorithm introduced in [38] exhibits linear complexity scaling with respect to all input parameters, 
0{Rnr). 

The multigrid Tucker decomposition algorithm applied to full format tensors [38| leads to the 
complexity scaling O(n^), whereas its standard version (commonly used HOSVD algorithm in 
computer science) scales as 0{n‘^), becoming intractable even for moderate sizes of tensors. 

We conclude by notice that the optimized Tucker and canonical tensor approximations can 
be computed by the alternating least square (ALS) iteration with initial guess obtained by 
HOSVD/RHOSVD approximations. 

2.3 Matrix Product States/Tensor Train format 

The product-type representation of dth order tensors, which is called in the physical literature as 
the matrix product states (MRS) decomposition (or more generally, tensor network states mod¬ 
els), was introduced and successfully applied in DMRG quantum computations (721 [Ml El], and, 
independently, in quantum molecular dynamics as the multilayer (ML) MCTDH methods [701153j . 
MRS type representations reduce the storage complexity to 0{dr‘^N), where r is the maximal rank 
parameter. 
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In the recent years the various versions of the MPS-type tensor approximations were discussed 
and further investigated in mathematical literature including the hierarchical dimension splitting 
m, the tensor train (TT) [56l[l3], the quantics-TT (QTT) [33], as well as the hierarchical Tucker 
representation [25] . which belongs to the class of tensor network states model. 

The TT format is the particular form of MPS type factorization in the case of open boundary 
conditions. For a given rank parameter r = (ro, ...,rrf), the rank-r TT format contains all elements 
A = € ]^riix...xnjj^ which can be represented as the contracted products of 3-tensors, that 

in the index notation takes a form, 

ri rd 

ah,-M = ^ ■■■ 

01=1 OjJ = l 

specihed by the set of column vectors, aa],o£+i £ = 1, or equivalently by the vector¬ 

valued ri X matrices, = [aa^a^+i], (i-®-) 3-tensors), cf. ()2.2h . The latter representation is 
written in the matrix product form, explaining the notion MPS, where A^^^ii) is r£_i x ri matrix. 



Figure 2.5: Visualizing 5th-order MPS/TT tensor. 

Figure [23] illustrates the TT representation of a 5th-order tensor, where each particular entry is 
factorized as a product of hve matrices, = A^^'){ii)A^‘^')( 12 ) ■ ■ ■ where, for example, 

A(2)(i2) E 

The rank-structured tensor formats like canonical. Tucker and MPS/TT-type decompositions 
also apply to matrices. For example, the d-dimensional FFT matrix over grid can be imple¬ 
mented on the rank-A: tensor with the linear-logarithmic cost 0{dkN log 2 N), due to the rank-1 
factorized representation 

d- ® /)(/ ® ® /)...(/ ® I... (g) ® ... (g) fI^\ 

where G represents the univariate FFT matrix along mode £. 

2.4 Qualities tensor approximation of functional vectors 

In the case of large mode size, the asymptotic storage cost for a class of function related N-d 
tensors can be reduced to 0(4 log A) by using quantics-TT (QTT) tensor approximation method 
[33]. The QTT-type approximation of an A-vector with A = 2^, L G N, is dehned as the tensor 
decomposition (approximation) in the canonical, TT or more general formats applied to a tensor 
obtained by the dyadic folding (reshaping) of the target vector to an L-dimensional 2 x ... x 2 data 
array (tensor) that is thought as an element of the L-dimensional quantized tensor space. 

In the vector case, i.e. for 4 = 1, a vector x = [xi] G with A = 2^, is reshaped to its 
qualities (quantized) image in by dyadic folding, 

F2,l : X ^ Y = [yj] G j = (ji, ••■Vi), jv G {1,2}, 
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where for fixed i, we have yj := Xi, with ji, = ji/(i) = C^-i {v = 1, L) being dehned via binary 
coding, i.e. the coefficients Ci,-i € {0,1} are found from the binary representation of i — 1, 


L 

i — 1 = Cq + Ci2^ + • • • + Cl-i2^ ^ — 1)2'^ 

v=l 


Next figure visualizes the QTT approximation process. 


N=2 


3 


L=log N=3 





Suppose that the quantics image for an A^-vector, i.e. an element of L-dimensional quantized 
tensor space with L = log 2 N, can be represented (approximated) by the low-rank canon¬ 

ical or TT tensor of order L, thus introducing the QTT approximation of an A^-vector. Given 
rank parameters {k = 1, ...,T), the QTT approximation of an A^-vector requires a number of 
representation parameters estimated by 

2r^ log 2 A^ A^, where < r, A: = 1,...,L, 

providing log-volume complexity scaling in the size of initial vector, N. For d > 1 the construction 
is similar |33j . 

The power of QTT approximation method is explained by the theoretical substantiation of the 
QTT approximation properties discovered in [33] and establishing the perfect rank-r decomposi¬ 
tion for the wide class of function-related tensors obtained by sampling continuous functions over 
uniform (or properly rehned) grid: 

• r = 1 for complex exponents, 

• r = 2 for trigonometric functions and plain-waves. 

• r < m -|- 1 for polynomials of degree m, 

• r is a small constant for wavelet basis functions, Gaussians, etc. 
all independently on the vector size N. 

Notice that the notion quantics (or quantized) tensor approximation (with a shorthand QTT) 
originally introduced in 2009, see [33|, is reminiscent of the entity “quantum of information”, that 
mimics the minimal possible mode size (n = 2) of the quantized image. 

Goncerning the matrix case, it was first found in [55] by numerical tests that in some cases 
the dyadic reshaping of an A^ x A^ matrix with N = 2^ may lead to a small TT-rank of the 
resultant matrix rearranged to the tensor form. The efficient low-rank QTT representation for a 
class of discrete multidimensional elliptic operators (matrices) and their inverse was proven in [32] . 
Moreover, based on the QTT approximation, the important algebraic matrix operations like FFT, 
convolution and wavelet transforms can be implemented by superfast algorithms in 0(log2 N)- 
complexity, see survey paper [36| and references therein. 


9 
























2.5 Rank-structured tensor operations in ID complexity 

The rank-structured tensor representation provides ID complexity of multilinear operations with 
multidimensional tensors. Rank-structured tensor representation provides fast multi-linear algebra 
with linear complexity scaling in the dimension d. 

For given canonical tensors A and B as in (EU), with ranks Ra and Rb, respectively, their 
Euclidean scalar product can be computed by univariate operations 

Ra Rb d 

(A,B) = n (af\bf ) , (2.3) 

i=i j=i i=i 


at the expense 0{dnRaRb)- 

The Hadamard (entrywise) product of tensors A, B is defined by Y = := A © B, 

where For canonical tensors A and B given in form (12.ID . the Hadamard 

product is calculated in 0{dnRaRb) operations by ID entrywise products of vectors, 

Ra Rb 

A 0 B = ^ ^ acj ^ 0 0 ... 0 0 . (2.4) 

i=l j=l 

Summation of two tensors in the canonical format C = A -|- B is performed by a simple concate¬ 
nation of their factor matrices, ..., a^^] and B^^'> = [b^^\ ..., b^^], 

CW = [aP,...,a2,bf\...,b2J] e (2.5) 

The rank of the resulting canonical tensor increases up to Rc = Ra A Rb- 

In electronic structure calculations, the 3D convolution transform with the Newton kernel, 

, is the most computationally expensive operation. The tensor method to compute convolu¬ 
tion over large n x n x n Cartesian grids in 0(n log n) complexity was introduced in [33]. Given 
canonical tensors A, B, their convolution product is represented by the sum of tensor products of 
ID convolutions. 


Ra Rb 

a*b = EE 


CiCd 


i=l j=l 


ap^ * b^-^^ 

* j 




( 2 . 6 ) 


where a), ' *bV denotes the univariate convolution product of n-vectors. The cost of tensor convolu¬ 
tion in both storage and time is estimated by 0{RaRbn log n). The resulting algorithm considerably 
outperforms the conventional 3D FFT-based approaches of complexity 0{v? logn), see numerics in 


The sequences of rank-structured operations on matrices and vectors normally lead to the 
increase of tensor ranks, usually being multiplied or added after each operation. The necessary 
rank reduction in the Tucker and MPS type formats can be implemented by stable algorithms 
based on the higher order SVD. In the physical community the HOSVD-type algorithms are known 
since longer as the Schmidt decomposition [691 (681161j . In the case of canonical tensors the rank 
reduction can be performed by the RHOSVD algorithm based on the canonical-to-Tucker and then 
Tucker-to-canonical transforms described and analyzed in [371 ESI ES] (see ^2.21 and Appendix), 
which demonstrated the stable behavior for most of examples in the Hartree-Fock calculations we 
considered so far. The stability conditions for the RHOSVD have been discussed in [38l 138] . 
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3 Tensor calculus for the Hartree-Fock equation 


3.1 Calculation of multi-dimensional integrals 

Tensor-structured calculation of the multidimensional convolution integral operators with the New¬ 
ton kernel have been introduced in [381 ITTl [39] , where on the examples of the Hartree and exchange 
operators in the Hartree-Fock equation, it was shown that calculation of the 3D and 6D convolu¬ 
tion integrals can be reduced to a combination of ID Hadamard products, ID convolutions and ID 
scalar products. 



Figure 3.1: Left: Glycine amino acid in a computational box. Right: approximation of the 
Gaussian-type basis function by a piecewise constant function. 

The molecule is embedded in a certain fixed computational box D = [—b,b]^ £ as in 
Fig. 13.11 left. For a given discretization parameter n £ N, we use the equidistant n x re x n 
tensor grid i £ T := {l,...,re}^, with the mesh-size h = 2bl{n + 1). Then the 

Gaussian basis functions gk{x), x £ M^, are approximated by sampling their values at the centers 
of discretization intervals, as in Fig. 13.11 right, using one-dimensional piecewise constant basis 
functions gk{x) ^gi.{x) = \^(,=i'g'^^\xi), i = 1,2,3, yielding their rank-1 tensor representation, 

Gfc = ® gf ® gf £ A: = l,...,iVfe. (3.1) 

Let us consider the tensor calculation of the Hartree potential 

Vh{x):= [ 

JR3 \\x - y\\ 

and the corresponding Goulomb matrix, 

Jkm-= gk{x)gm{x)VH{x)dx, A:,m = 1,... iVb x £ 

JR3 

^orb 

where the electron density, p{x) = 2 ^ is represented in terms of molecular orbitals ^a{x) = 

a=\ 

Nb 

Ca,kgk{x)- Given the discrete tensor representation of basis functions (13.11) . the electron density 

k=l 

is approximated using ID Hadamard products of rank-1 tensors (instead of product of Gaussians), 

^orb ^b 

p « 0 = ^ ^ ^ Ca,mCa,k{S^^ © G (gf © g^^^) © (gf © g^^) ^ 

a=l k=l m=l 

Further, the representation of the Newton kernel by a canonical rank-i? 7 v tensor [7j is used 

(see Appendix for details), 

Rn 

Pr = Y1 ® ® P^ e (3.2) 

q=l 
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Since large ranks make tensor operations inefficient, the multigrid canonical-to-Tucker and Tucker- 
to-canonical algorithms should be applied to reduce the initial rank of 0 i— 0 ' by several orders of 
magnitude, from N^/2 to Rp <C N^/2. Then the 3D tensor representation of the Hartree potential 
is calculated by using the 3D tensor product convolution, which is a sum of tensor products of ID 
convolutions, 

Rp -Riv 

Vh = &' *PR='^'^Cm * pW) (g) (g) * p®) . 

m=l q=l 

The Coulomb matrix entries Jkm are obtained by ID scalar products of V// with the Galerkin basis 
consisting of rank -1 tensors, 

Jkm ~ (Grfc © Gjti, V//), k,Tn — 1 , . . . Nfj. 

The cost of 3D tensor product convolution is 0(nlog n) instead of O(n^logn) for the standard 
benchmark 3D convolution using the 3D FFT. Next Table shows CPU times (sec) for the Matlab 
computation of Vh for H 2 O molecule [3H] on a SUN station using 8 Opteron Dual-Core/2600 
processors (times for 3D FFT for n > 1024 are obtained by extrapolation). C2T shows the time for 
the canonical-to-Tucker rank reduction. In a similar way, the algorithm for 3D grid-based tensor- 



1024^ 

2048^ 

4096^ 

8192^ 

16384^ 

FFTg 

~ 6000 

- 

- 

- 

~ 2 years 

C*C 

8.8 

20.0 

61.0 

157.5 

299.2 

C2T 

6.9 

10.9 

20.0 

37.9 

86.0 


Table 3.1: Times (sec) for the 3D tensor product convolution vs. 3D FFT convolution. 


structured calculation of 6 D integrals in the exchange potential operator was introduced in [5U] . 

R^orh 

^km ~ ^km,a with 
a=l 


Kkm,a-= [ [ k,m = l,...Nh, 

Jr 3 \x - y\ 

the contribution from the o-th orbital are approximated by tensor anzats, 


Kk 


m,a • — 


Gfc 0 






5 

G/tt, 0 ^ ^ Ci/aGi/ 

_M=1 


u=l 



Here, the tensor product convolution is first calculated for each ath orbital, and then scalar products 
in canonical format yield the contributions to entries of the exchange Galerkin matrix from the a-th 
orbital. 

These algorithms were employed in the first tensor-structured solver using 3D grid-based evalu¬ 
ation of the Coulomb and exchange matrices in ID complexity at every step of SCF EVP iteration 
[Ml SI]. A sequence of dyadically refined 3D Cartesian grids was used for reducing time in first 
iterations, with an e convergence criterion for switching to larger grids. This is a nonstandard 
computational scheme avoiding calculation of the two-electron integrals. The accuracy for small 
molecules like H 2 O and CH 4 was of the order of 10“'^ Hartree. Though time performance of this 
solver was not compatible with the standard Hartree-Fock packages it was the first proof of concept 
for the tensor numerical methods. 
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3.2 3D grid-based calculation of the two-electron integrals. 

The basic tensor-structured Hartree-Fock solver employs the factorized 3D grid-based calculation 
of the two-electron integrals tensor, B = in the form 


b 


flUK,X — 


9fi{x)g^{x)gf,{y)gx{y) 

Ik - y\\ 


dxdy — (G^ © Gj/, P_r * (G^ 0 Ga ))^®3 j 


(3.3) 


by using univariate tensor operations. Introduce the side matrices representing on the grid the 
full set of canonical vectors composing the products of the Gaussian basis functions, {G^OG,}, 










= 1,2,3, 


where in most of our Hartree-Fock calculations grids of size = 32 • 10^ or = 64 • 10^ have 
been used. It was found that the large matrices of size n x (e.g. = 40000 for Alanine 

amino acid) can be approximated with high accuracy by low rank matrices with the rank parameter 
bounded by < Nb |431144| . The corresponding low-rank factorizations (“ID density fitting”) in 
the form (for £ = 1,2, 3) 


qW ^ c/W e (3.4) 

is computed by the truncated Cholesky decomposition of the symmetric, positive dehnite 
(see [ISl m] for more details). 

Based on factorization (|3.4p . the number of convolutions in (13.31) is reduced dramatically from 
to Nb at most (say from 40000 to 200). In fact, using canonical factors from the rank-i? canonical 
tensor Pjj representing the Newton kernel (13.21) (see (|7.8I) in Appendix) we, first, precompute the 
set of “convolution“ matrices for every space variable £, £ = 1, 2, 3, 

Mf = *n D(^)) e k = 1,...,R, (3.5) 

which includes the convolution products with Ri < Nb column vectors of the matrix G 
instead of N"^/2 convolutions in the initial formulation. 

Given matrices , then the resulting 4-th order TEI tensor is represented in a matrix form 


B = mat{B) ^ B, := G R^i^^K 

k=l 


The above nonstandard factorization of the TEI matrix B allows to reduce dramatically the com¬ 
putational cost of the standard Cholesky factorization schemes [MIE] applied to the reduced-rank 
symmetric positive dehnite matrix B. In fact, the low-rank Cholesky decomposition of B is calcu¬ 
lated in the following sequence. First, the diagonal elements of B are calculated as 


= j;0i,FW(i,:)MfVW(:,f)^. 

k=l 

Then the selected columns of the matrix B, required for the rank-truncated Cholesky factorization 
scheme, are computed by the following fast tensor operations 

k=l 
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leading to representation of the matrix B in the form of rank-ii^ approximate decomposition 

[laiii] 

B := [ 6 ^.,.a] « with L e Rb ~ Nb. 

This algorithm is much faster than the direct Cholesky decomposition of the matrix B with on- 
the-fly computation of the required column vectors. 


re^ 

16384^ 

32768^ 

65536^ 

131072^ 

mesh (bohr) 

0.0024 

0.0012 

1 

0 

r —1 

0 

1 

0 

r —1 

CO 

Had. prod. 

1.6 

3.4 

12 

19 

Density fit. 

0.5 

1.0 

2.0 

4.0 

3D Conv. time 

69 

151 

698 

496 

Ghol. time 

2.2 

2.2 

2.2 

2.2 


Table 3.2: Times (sec) for 3D grid-based calculation of the directional density fitting and the TEI 
for H 2 O molecule. 


Table represents times (sec) for 3D grid-based calculation of the directional density htting 
and the TEI tensor (electron repulsion integrals) for H 2 O molecule in a box [—20,20]^ bohr^, 
performed in Matlab on a 2-Intel Xeon Hexa-Core/2677. Time for convolution integrals in ()3.5p 
scales almost linearly in the ID grid-size n as expected by theory. 


3.3 Core Hamiltonian. 

The Galerkin representation of the 3D Laplace operator in the nonlocal Gaussian basis 
{gk{x)}i<k<NhJ ^ leads to the fully populated matrix Ag = [akm] S Tensor 

calculation of the matrix entries akm for the discrete Laplacian Ag in the separable Gaussian basis 
is reduced to ID matrix operations [IS] involving the EEM Laplacian A 3 , defined on re x n x n 
grid, 

A 3 = aS^^ (g) 0 j(3) + j(l) ^ ^(2) ^ j(3) ^ j(l) ^ j( 2 ) ^ ^(3)^ 

where Ai = ^tridiag{ —1,2, —1}. Specifically, we have 

^km {A^Gk, Gm), 


where is the tensor representation of Gaussian basis functions using the piecewise linear finite 
elements. In the case of large re x re x re grids, this calculation can be implemented with logarithmic 
cost in re by using the low-rank QTT representation of the large matrix A 3 , see [32]. 

For tensor calculation of the nuclear potential operator 


Vc{x) 


M 


E 

a=l 




X — On 


Za > 0, X, tta G 


we apply the rank -1 windowing operator, Wq = (g) (g) yVa \ for shifting the reference 

Newton kernel Pr G ^ 2 nx 2 nx 2 n according to the coordinates of nuclei in a molecule (see Section 5 
and Appendix). Then the resulting nuclear potential, Pc G ig obtained as a direct tensor 

sum of shifted potentials [45| . 

M 

Pc = Yl ZaWoiPR 

OL = l 
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M R 

= E ® >vFpf ^ ® 

ol =1 g=l 

This leads to the following representation of the Galerkin matrix, I 4 = [Tfcm]) by tensor operations 

Vkm= Vc{x)gk{x)g^{x)dx ra {GkQGm,'Pc), l<k,m<Nb. 

Jr^ 

Figure [221 left, shows several vectors of the canonical representation of the Coulomb kernel along 
one of variables. Figure [221 right, represents the cross-section of the resulting nuclear potential Pc 
for C 2 H 5 OH molecule. 



Figure 3.2: Left: Several vectors of the canonical representation of the Newton kernel along one of 
variables. Right: calculated sum of the nuclear potentials for ethanol molecule. 


3.4 Black-box tensor solver. 

The tensor-structured Hartree-Fock solver [35] based on factorized calculation of the two-electron 
integrals | 32 | includes efficient tensor implementation of the MP2 energy correction |44j scheme. 
Though it is yet implemented in Matlab, its performance in time and accuracy is compatible with 
the standard packages based on analytical evaluation of the two-electron integrals. Due to ID 
complexity of all calculations, it enables 3D grids of the size 10^®, yielding mesh size of the order of 
atomic radii, 10““^ A. That ensures high accuracy of calculations, which is controlled by the e-ranks 
of tensor truncation. 

The solver works in a black-box way: input the grid-based basis-functions and coordinates of 
nuclei in a molecule and start the program. Calculation of TEI for H 2 O on grids 32768^ takes 
two minutes on a laptop. The time for TEI with = 131072^ for Alanine amino acid takes 
approximately one hour in Matlab, including incorporated density fitting. 

Next examples compare the results from benchmark Molpro program with calculations by the 
tensor-based solver by using the same Gaussian basis, but now discretized on 3D Cartesian grid. 
Eor all molecules we use the ”cc-pVDZ“ Gaussian basis set. The core Hamiltonian part in these 
calculations is taken from Molpro. Note that since in the tensor solver the density fitting is included 
in ”blind “ calculation of TEI, it is not easy to compare the CPU times of our calculations with 
those in ” ab-initio“ procedure in the standard programs, because the density fitting step there is 
usually considered as off-line pre-computing. 

Eigure (221 shows convergence history of ab initio iterations for H 2 O molecule (cc-pVDZ-41), 
where TEI is computed on the 3D grid of size 131072^, while Figure 12121 fright 1 presents the zoom 
of the left graphic at the last 20 iterations. The ground state energy from Molpro is shown by the 
black dashed line. 
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Figure 3.3; Left: SCF EVP iteration for H 2 O; Right: convergence of the ground state energy vs. 
final 20 iterations for H 2 O. 


Figure [ 33 ] (left) shows the SCF EVP iteration for Glycine amino acid (cc-pVDZ-170), where 
dashed line indicates convergence of the residual and the red solid line shows convergence of the 
error to ground state energy from MOLPRO package with the same basis set. Figure 13.41 (right) 
illustrates the convergence of the ground state energy at final 20 iterations for Glycine, dashed 
line is the energy from Molpro. The second row in Table 13.31 represents times (sec) for one step 




Figure 3.4: Left: SCF EVP iteration for Glycine (C 2 H 5 NO 2 ) molecule. Right: convergence of the 
ground state energy at final 20 iterations for Glycine molecule. 

of SCF iteration in ab-initio solution of the Hartree Fock EVP for H 2 O (cc-pVDZ-41), H 2 O 2 
(cc-pVDZ- 68 ), and C 2 H 5 NO 2 (cc-pVDZ-170) molecules while the hrst row shows the molecular 
parameters, number of orbitals and basis functions. Next two rows show the relative difference 
AEo,g = |Eo,g — Eq\/\Eq\, between the grid-based ground state energy, Eo^g and Eq from Molpro 
with the same basis sets. This numerics demonstrates that in the case of fine enough spacial 
n X n X n-grids the accuracy in 7 - 8 digits (i.e. relative accuracy about 10“^ — 10“®) can be 
achieved for moderate size molecules up to small amino acids. All calculations are performed in 
the computational box of size [—20,20]® bohr®. This tensor-based solver can be considered as the 
computational tool for trying the alternatives to Gaussian-type basis sets. 
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H2O 

H2O2 

C2H5NO2 

Norb, Nb 

5; 41 

9; 68 

20; 170 

Time 

0.35 

0.55 

6.0 

AEo.q, (65536^) 

3.0-10-^ 

8.0-10-« 

9.1 • 10-'' 

AXo.g, (131072^) 

1.4-10-^ 

3.9 • 10-« 

0 

1 

0 

1—1 

0 

06 


Table 3.3; Time (sec) for one ab-initio SCF EVP iteration and accuracy of ab-initio solution with 
respect to grid-size in TEI calculations. Matlab on an Intel Xeon X5650. 

4 Prom MP2 energy correction to excited states 

4.1 MP2 correction scheme by using tensor formats 

Given the set of Hartree-Fock molecular orbitals {Cp} and the corresponding energies {sp}, p = 
1,2, ...,Nh, where {Cj} and {Ca} denote the occupied and virtual orbitals, respectively. First, one 
has to transform the TEI matrix B = corresponding to the initial AO basis set, to those 

represented in the molecular orbital (MO) basis, 


Nb 

F — ■ Viajb — ^ ^ Cp^iCi/aC\jC(jl)bpi/^X(j, (^'f) 

/i,h',A,C7=l 

where a, 6 € i, j S Xq, and Xq := {1,..., Norb]-, Xu := {^orb + 1; ^b}i with N^rb denoting the 
number of occupied orbitals. In the following, we shall use the notation 


Ny = Nb — Norb, Nov = NorbNy. 


The straightforward computation of the matrix V by above representation makes the dominating 
impact to the overall numerical cost of order 0{N^). The method of complexity 0{N^) based on 
the low-rank tensor decomposition of the matrix V was introduced in [H] . Indeed, it can be shown 
that the rank Rb = 0{Nb) approximation to the TEI matrix B ~ LL^, with the N x Rb Cholesky 
factor L, allows to introduce the low-rank representation of the matrix V, (see [S] and M) 

V = LvL^, 


and then reduce the asymptotic complexity of calculations to 0{N^). 

Given the tensor V = [viajb], the second order MP2 perturbation to the HF energy is calculated 
by 

EmP 2 ~ — '^iajb{‘^Viajb ~ Vibja) 2 ^ 

a,b&Ivir i,j&Iocc 

where the real numbers Sk, k = 1,Nb, represent the HF eigenvalues. 

Introducing the so-called doubles amplitude tensor T, 


T — • tiajb — 


{‘^Viajb Pibja 


CL,b E: lyir, k j ^ 


Ea Eb Ei Ej 

the MP2 perturbation takes the form of a simple scalar product of tensors, 


Emp2 = -(V,T) =-(V0T,l), 
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where 1 denotes the rank-1 all-ones tensor. Introducing the low e-rank reciprocal “energy" tensor 




1 

£a £b S'j £j 


CL^b E: Ivirt h J E Iq 


and the partly transposed tensor (transposition in indices a and b) 


allows to decompose the doubles amplitude tensor T as follows 

T = -h t(2) = 2V © E - V' 0 E. 


(4.3) 


(4.4) 


Notice that the denominator in (14.21) remains strongly positive if e^ > 0 for a E Ivir and e* < 
0 for i E locc- The latter condition (nonzero homo lumo gap) allows to prove the low e-rank 
decomposition of the tensor E [431144] . 

Each term in the right-hand side in (|4.4I) can be treated separately by using ranks-structured 
tensor decompositions of V and E, possibly combined with various symmetries and data sparsity. 
Numerical tests illustrating the tensor approach to the MP2 energy correction are presented in |44] . 


4.2 Toward low-rank approximation of the Bethe-Salpeter equation for calcu¬ 
lation of excited states 


One of the commonly used approaches for calculation of the excited states in molecules and solids, 
along with the time-dependent DFT, is based on the solution of the Bethe-Salpeter equation (BSE), 
see for example [111 160] . The BSE approach leads to the challenging computational task on the 
solution of the eigenvalue problem for determining the excitation energies ojn, governed by a large 
fully populated matrix of size 0{N‘^) 0{N^), 



(4.5) 


so that the computation of the entire spectrum is prohibitively expensive. Here the large matrix 
blocks of size Nqv x Noy take a form 


4 = Ae + E-lT, B = V -W, 

where the diagonal ’’energy” matrix is defined by 

Ae = [A£iajb] E : A£iajb = (^a - ei)bij6ab, 

while the matrices W = \wiajb] and W = [wiajb] are determined by permutation of the so-called 
static screened interaction matrix W = [wiajb], via Wiajb = 'Wij,ab, and [wia,jb] = ['Wib,aj], respec¬ 
tively. In turn, the forth order tensor W = [wiajb] is constructed by certain linear transformations 
of the tensor V = [viajb], see [HJ [60] . 

A number of numerical methods for structured eigenvalue problems have been discussed in the 
literature [501111 [niisii. 

The tensor approach to the solution of the partial BSE eigenvalue problem for equation (|4.5I) 
proposed in |6| suggests to compute the reduced basis set by solving the simplihed eigenvalue 
problem via the low-rank plus diagonal approximation to the matrix blocks A and B, and then 
solve spectral problem for the subsequent Galerkin projection of the initial system (14.51) to this 
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reduced basis. This procedure relies entirely on multiplication of the simplified BSE matrix with 
vectors. 

It was demonstrated on the examples of moderate size molecules [ 6 ] that a small reduced basis 
set, obtained by separable approximation with the rank parameters of about several tens, allows 
to reveal several lowest excitation energies and respective excited states with the accuracy about 
O.leV - 0.02eV. 




Figure 4.1: Comparison of mo = 30 lower eigenvalues for the reduced and exact BSE systems for NH 3 
molecule: e = 0 . 6 , left; e = 0 . 1 , right. 

Figure ITT] illustrates the BSE energy spectrum of the NH 3 molecule (based on HF calculations 
with cc-pDVZ-48 GTO basis) for the lowest Nj-ed = 30 eigenvalues vs. the rank truncation pa¬ 
rameter e = 0.6 and 0.1, where the ranks of V and the BSE matrix block W are 4, 5 and 28, 30, 
respectively. For the choice e = 0.6 and e = 0.1, the error in the 1st (lowest) eigenvalue for the 
solution of the problem in reduced basis is about O.lleV and 0.025eV, correspondingly. The CPU 
time in the laptop Matlab implementation of each example is about 5sec. 

5 Tensor approach to simulation of large crystalline clusters 

In this section, we briefly discuss the generalization of the tensor-based Hartree-Fock solver to the 
case of large lattice structured and periodic systems gSlIlT] arising in the numerical modeling of 
crystalline, metallic and polymer type compounds. 

5.1 Fast tensor calculation of a lattice sum of interaction potentials 

One of the challenges in the numerical treatment of large molecular systems is the summation 
of long-range potentials allocated on large 3D lattices. The conventional Ewald summation tech¬ 
niques based on a separate evaluation of contributions from the short- and long-range parts of the 
interaction potential exhibit 0(L^ log L) complexity scaling for a cubic Lx Lx L 3D lattice. 

In the contrary, the main idea of the novel tensor summation method introduced in [461148] 
suggests to benefit from the low-rank tensor decomposition of the generating kernel approximated 
on the fine n x n x n representation grid in the 3D computational box LLl. This allows to com¬ 
pletely decouple the 3D sum into the three independent ID summations, thus reducing drastically 
the numerical expenses. The resultant potential sum, which now requires only 0{n) storage and 
0{nL) computational demands, is represented by a few assembled canonical/Tucker n-vectors of 
complicated shape (see Fignre IHTT]) . where n is the univariate grid size for a cubic 3D lattice. 
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For ease of exposition, we consider the electrostatic potential the nuclear potential of a single 
hydrogen atom, Vc{x) = Define the scaled unit cell, Dq = [0, of size b x b x b and consider 
a sum of interaction potentials in a symmetric computational box 

= B X B X B, with B = h[— — , —], L = 2Lo G N, 

consisting of a union ol Lx Lx L unit cells obtained from Dq by a shift along the lattice vector 
6 k, where k = {ki, k2, ks) G such that /c^ G /C := /C_U/C+ for ^ = 1, 2, 3 with /C_ := { — 1,..., —■§} 
and /C+ := {0,1, ^ - 1}. Hence, we have Ql = UkiMM&tc G 

Recall that 6 = nh, where h > 0 is the fine mesh size that is the same for all spatial variables, 
and n is the number of grid points for each variable. We also define the accompanying domain 
LIl obtained by scaling o^ Ql with the factor of 2, = 212^, and, similar to (j7.8p . introduce the 

respective rank-R reference (master) tensor 

P = ^ pW ® p® ® pj^) G M2nx2nx2n^ (5 

9=1 


approximating in on a 2 n x 2 n x 2 n representation grid with mesh size h. 
Let us consider a sum of single Coulomb potentials on a L x L x L lattice, 

Zi 




ki,k2,k:iG)C 


\x - ai{ki,k 2 ,k 3 )\\ 


X G G 


(5.2) 


The assembled tensor approach applies to the potentials defined on n x n x n 3D Cartesian grid. 
It reduces the sum over a rectangular 3D lattice, Pc^ G 


CL 

R 


^CL = Y ^i^(k)P = Y Y ^(k)(pS^^ ® ® p?^)’ 

keX:® ki,k2,k3GK q=l 


to the summation of directional vectors for the canonical decomposition of shifted single Newton 
kernels [46| . 

p-r = E( E ® (E ® (E >V(fc 3 )pf), ( 5 . 3 ) 

9=1 kiGlC k2eK ks&K 

where = Wki ® ® VVfcg is the shift-and-windowing (onto Ql) separable transform along 

the k-grid. Remarkably that the rank of the resulting sum is the same as for the R-term canonical 
reference tensor (15.11) representing the single Newton kernel. 




Figure 5.1: Assembled x- and y-axis canonical vectors for a cluster of 32 x 16 x 8 Hydrogen nuclei. 
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The numerical cost and storage size are bounded by 0{RLn) and 0{Rn), respectively, where 
n = noL, and no is the grid size in the unit cell. 

Figures [5T] and [52] represent the shape of assembled canonical vectors for the cluster of 32 x 16x8 
Hydrogen nuclei. Size of the computational box is 62 x 32 x 22 bohr^. Here the empty interval 
between the lattice and the boundary of the computational box equals to 6 bohr. 



Figure 5.2: Assembled z-axis canonical vectors for a cluster of 32 x 16 x 8 Hydrogen nuclei. As¬ 
sembled canonical sum of the Newton potentials for this cluster. 

The next table represents CPU times for the lattice summation of the Newton kernels over 
L X L X L cubic box, with very fine nx nx n representation grid. 



4096 

32768 

262144 

2097152 

Time (sec.) 

1.8 

0.8 

3.1 

15.8 

3D grid size, 

5632^ 

9728^ 

17920^ 

34304^ 



Figure 5.3: Assembled canonical sum of the Coulomb potentials on the L-shaped (left) and O- 
shaped (right) sub-lattices of a 32 x 32 x 1 lattice. 

The summation method in the canonical format was extended to the Tucker tensors which 
allows the principal generalization of this techniques to the case of rather complicated lattices with 
defects (Theorem 3.2, [M]), so that the resulting sum takes a form 

ri r2 rs 

Tcl = femi,m2,m3( ® ® 

mi=l m2=l m3=l kiGlC k2&K ki&K. 

~(£) 

where tme, = 1; 2, 3, represents the Tucker vectors of the rank-r master tensor approximating the 
Newton kernel in Ql on a 2n x 2n x 2n representation grid. 

In the case of defected lattices the increasing rank of the hnal sum of Tucker tensors can be 
reduced by the stable ALS based algorithms applicable to the Tucker tensors (see [IT]). The 
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particular examples of the lattice geometries suitable for our approach are presented in Figure E31 
see m for more detailed discussion. 

For the reduced Hartree-Fock equation, where the Fock operator is confined to the core Hamil¬ 
tonian, the tensor-structured block-circulant representation of the Fock matrix was introduced |47j 
that allows the special low-rank approximation of the matrix blocks. This opens the way for the 
numerical treatment of large eigenvalue problems with structured matrices arising in the solution 
of the Hartree-Fock equation for large crystalline systems with defects. 

5.2 Interaction energy of long-range potentials on finite lattices 

Given the nuclear charges {^k}; centered at points Xk, k £ /C^, located on a L x L x L lattice 
Cl = {iCk} with the step-size b, the interaction energy of the total electrostatic potential of these 
charges is defined by the lattice sum 

1 Z Z’ 

El = - 71 — II , i.e. for llxj — Xkll > b. (5.4) 

2 . - a^k 

kje^.k^tj ■> 

The fast and accurate computation of electrostatic interaction energy (as well as the related forces 
and stresses) is one of the difficult tasks in computer modeling of macromolecular structures like 
finite crystal, and biological systems. 

The tensor summation scheme (j5..Sp can be directly applied to this computational problem. In 
what follows we show that (j5.4p can be treated as a particular case of the previous scheme served 
for calculation of (15.21) on a fine spacial grid. For this discussion, we assume that all charges are 
equal, Z\^ = Z. 

First, notice that the rank-i? reference tensor /i“^P defined in (15.11) approximates with high 
accuracy 0{h?) the (and its shifted version) Coulomb potential in 17^ (for ||a:|| > b that is 
required for the energy expression) on the fine 2n x 2n x 2n representation grid with mesh size h. 
Likewise, the tensor approximates the potential sum 14^ {x) on the same fine representation 

grid including the lattice points Xk- 

We propose to evaluate the energy expression (15.4p by using tensor sums as in (15.3p . but now 
applied to a small sub-tensor of the rank-i? canonical reference tensor P, that is P^ := [Pix^] ^ 
]^ 2 Lx 2 Lx 2 L^ obtained by tracing of P at the accompanying lattice of the double size 2L x 2L x 2L, 
i.e. Cl = {a^k} U {xk'} £ Here Pja-j^ denotes the tensor entry corresponding to the k-th lattice 
point designating the atomic center Xk. We are interested in the computation of the rank-i? tensor 
Pc^ = [Pci| 3 ;^]keA: S where ^cl\x^ denotes the tensor entry corresponding to the k-th 

lattice point on Cl- The tensor Pc^ can be computed at the expense 0{Lp‘) by 

= E( E *r(»,)pi‘’ ® E ® E »r(t.)pgl,)- 

q=l ki&K k2&K. k3£K. 

This leads to the representation of the energy sum (15.4p with accuracy 0{h?‘) in a form 

kex: 

where the first term in brackets represents the full canonical tensor lattice sum restricted to the 
k-grid composing the lattice Cl, while the second term introduces the correction at singular points 
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Xj — x\f^ = 0. Here 1 G jg all-ones tensor. By using the rank-1 tensor Pql = P| 3 ;^=ol) 

the correction term can be represented by a simple tensor operation 

P|xk=o = (Pol, !)• 

kex: 

Finally, the interaction energy El allows the representation 

^ 2^-3 ^ 

^L = ^—((Pc„l)-(P0L,1)), (5.5) 

that can be implemented in 0{L‘^) <C L^logL complexity by tensor operations with the rank-i? 
canonical tensors in 

Table 15.11 illustrates the performance of the algorithm described above. We compare the exact 
value computed by (|5.4p with the approximate tensor representation in (15.5p computed on the fine 
representation grid with n = tiqL, no = 128. We consider the lattices consisting of Hydrogen atoms 
with interatomic distance 2 bohr. The size of the largest 3D lattice with 256^ potentials is of the 
size 256 -2-1-6 bohiH, which is more than 20 nanometers. 



Total 

T. 

Ttens. 

El 

err. 

24 ^ 

13824 

37 

1.2 

to 

O 

r—1 

17- 

CO 

2 • 10 -« 

32 ^ 

32768 

250 

1.5 

1.5 • 10 ^ 

1.5 • 10 -y 

48 ^ 

110592 

3374 

2.8 

1.12 • 10 ^ 

0 

64 ^ 

262144 

- 

5.7 

5.0 • 10 « 

- 

128 ^ 

2097152 

- 

13.5 

1.6 • 10 ^^ 

- 

256 ^ 

16777216 

- 

68.2 

5.2 • 10 ^^ 

- 


Table 5.1: Comparison of times for the standard (T^) and tensor-based (Ttens.) calculation of the 
interaction energy for the lattice electrostatic potentials. Matlab on an Intel Xeon X5650. 


The presented approach for fast calculation of the interaction energy can extended to the case of 
non-uniform rectangular lattices and, under certain assumptions, to the case of non-equal nuclear 
charges Moreover, it applies to many other types of spherically symmetric interaction poten¬ 
tials, for example, to shielded Coulomb interaction or van der Waals attraction sums corresponding 
to the distance function ||x||“^ and ||x||~®, respectively. 

6 Conclusions 

The goal of this paper is to attract interest of the specialists in computational quantum chemistry 
to recent results and open questions of the grid-based tensor approach in electronic structure 
calculations. Here we focus mostly on the description of main mathematical and algorithmic aspects 
of the tensor decomposition schemes and demonstrate their benefits in some applications. 

The scope of applications which can be regarded as consistent, ranges from the Hartree-Fock 
energy for moderate size molecules, including “blind” calculation of TEI tensor with incorporated 
algebraic density fitting, to calculation of the excited states for molecules, and up to a unique 
superfast method for calculating the lattice potential sums and the interaction energy of long range 
potentials on a lattice in a finite volume. Tensor approach allows to treat above problems using 

^Here 6 bohr is the chosen dummy distance (space between the computational box boundary and the lattice). 
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moderate computational facilities. All numerics given in the paper presents implementations in 
Matlab. 

The described numerical tools are not restricted to the applications presented here, but can 
be applied to various hard computational problems in (post) Hartree-Fock calculations related to 
accurate evaluation of multidimensional integrals, and efficient storage and manipulations with 
large multivariate data arrays. 

The presented method for summation of long-range potentials with sub-linear computational 
cost does not have analogues in what is used so far in computational quantum chemistry and can 
have a good future. For example, it can be useful in modeling of large finite molecular structures 
like nanostructures or quantum dots, where the periodic approach may be inconsistent. Calculating 
a sum of several millions of lattice potentials on fine 3D grids takes only several seconds in Matlab 
on a laptop [IB]. This method gives also the unique possibility to present the summed 3D lattice 
potential in the whole computational region with very high accuracy. Integration and differentiation 
of this 3D potential can be easily performed on representation grid due to ID computational costs. 

Tensor approach is now being evolved for modeling the electronic structure of finite crystalline- 
type molecular systems m- We hope that tensor numerical methods will have a good future in 
solving challenging multidimensional problems of computational quantum chemistry. 

7 Appendix 

1. Canonical-to-Tucker transform. 

The Canonical-to-Tucker tensor transform combined with the Tucker-to-Canonical scheme in¬ 
troduced in |38| usually applies for the rank reduction of the function related canonical tensors with 
the large initial rank. Here we sketch Algorithm Canonical-to-Tucker which includes the following 
basic steps: 

Input data: Side matrices G I = 1,2,3, composed of vectors G 

, k = 1,... R, see maximal Tucker-rank parameter r; maximal number of the alternating 

least square (ALS) iterations rumax (usually a small number). 

(A) Compute the singular value decomposition (SVD) of side matrices: 

^ = 1^2,3. 

Discard the singular vectors in and the respective singular values up to given rank threshold, 
yielding the small orthogonal matrices G I = 1,2,3. 

(B) Project side matrices onto the orthogonal basis set defined by 

[/W uW = (zW)^[/(^), G i= 1,2,3. (7.1) 

(C) (Find dominating subspaces). Implement the following ALS iteration nimax times at most. 
For I = 1, 2, 3 implement the following ALS iteration rUmax times at most. 

(D) Start ALS iteration for .^ = 1,2, 3: 

> For i = 1 : construct partially projected image of the full tensor, 

U ^ Ui = ® G® (g) CfeGM. (7.2) 

Here G are in physical space for mode 1 = 1, while G[.^^ G ^nd G® G the column 
vectors of 17^^^ and , respectively, belong to the coefficients space by means of projection. 
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[> Reshape the tensor Ui € matrix Mjj-^ S ]R^ix(^ 2 r 3 )^ representing the span 

of the optimized subset of mode-1 columns in partially projected tensor Ui. Compute the SVD of 
the matrix Mu^: 

Mu, = 

and truncate the set of singular vectors in i—)• E according to the restriction on 

the mode-1 Tucker rank, ri. 

[> Update the current approximation to the mode-1 dominating subspace Z^ i—)• Z^^\ 

[> Implement the single loop of ALS iteration for mode i = 2 and for i = 3. 

[> End of the single ALS iteration step. 

[> Repeat the complete ALS iteration rrimax times to obtain the optimized Tucker orthogonal 
side matrices Z^^\ Z^‘^\ Z^^\ and hnal projected image U 3 . 

(E) Project the hnal iterated tensor U 3 in ()7.2p using the resultant basis set in Z^^'^ to obtain 
the core tensor, (3 € ]^^ixr- 2 xr 3 _ 

Output data: The Tucker core tensor (3 and the Tucker orthogonal side matrices Z^^\ i = 1,2, 3. 

The Canonical-to-Tucker algorithm can be easily modihed to the e-truncation stopping criteria. 
Notice that in the case of equal Tucker ranks, = r, maximal canonical rank of the core tensor (3 
does not exceed r^, see [38], which completes the Tucker-to-Canonical part of the total algorithm. 
(Further optimization of the canonical rank in the small-size core tensor /3 can be implemented by 
applying the ALS iterative scheme in the canonical format, see e.g. |49|.i 
2. The Hartree-Fock equation in AO basis set. 

The 2A-electrons Hartree-Fock equation for pairwise L^-orthogonal electronic orbitals, 'ip, : 
^ M, S Lf^(]R^), reads as 

F'ipi{x) = \i'ipi{x), / ipiipjdx = 6ij, i,j = 1, ...,Norb (7.3) 

where the nonlinear Fock operator F is given by 

F:=-^A + V,{-) + Vh{-)+IC. 

Here the nuclear potential takes the form Vc{x) = — ^2^=1 \\x-a || ’ > 0, 0 ^/ E while the 

Hartree potential Vh{x) and the nonlocal exchange operator K, read as 

Vh{x) := p-k= j ,, ,, dy, x E M^, (7.4) 

II • II jr 3 Ik - 2/11 

and ^ 

(icip) (x) 2 ki * 11 ^) kk) = ^3 

respectively. Conventionally, we use the definitions 


^orb 

T{x,y) := 2 y] ilJi{x)'ipi{y), p{x) := t{x,x), 
i=l 

for the density matrix T{x,y), and electron density p{x). 

Usually, the Hartree-Fock equation is approximated by the standard Galerkin projection of the 
initial problem (17.311 by using the physically justified reduced basis sets (say, GTO type orbitals). 
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For a given finite Galerkin basis set {g^}i<i_i<Nbi 9^l S the occupied molecular orbitals ijji 

Nb 

are represented (approximately) as tpi = Y1 ^ ~ To derive an equation for the 

11=1 

unknown coefficients matrix C = {C^i} G first, we introduce the mass (overlap) matrix 

S = {S^u}i<ii,u<Nb, given by S^i, = g^gi^dx, and the stiffness matrix H = {hfj,u} of the core 
Hamiltonian % = — + 14 (the single-electron integrals), 

h^v = \ [ Vg^ ■ Vg^dx -h / Vc{x)g^gudx, I < < Nb- 

^ JR3 J]R3 

The core Hamiltonian matrix H can be precomputed in 0{N^) operations via grid-based approach. 

Given the finite basis set {5/x}i</i<Ari,, gfi G the associated fourth order two-electron 

integrals (TEI) tensor, B = [b^uXa], is defined entrywise by (jd.3h . where g, i^. A, a G {1,..., Ab} =: Ib. 
In computational quantum chemistry the nonlinear terms representing the Galerkin approximation 
to the Hartree and exchange operators are calculated traditionally by using the low-rank Gholesky 
decomposition of a matrix associated with the TEI tensor B = defined in (13.31) . that initially 

has the computational and storage complexity of order 0{N^). 

Introducing the Nb x Nb matrices J{D) and K(D), 

Nb ^ Nb 

'^{^)fiii — ^ ^ Ki^D ') ^ ^ b^x,vKDKX) 

Av,A=1 k.,X=1 

where D = 2CC"^ G is the rank-Aoj.;, symmetric density matrix, one then represents the 

complete Eock matrix F by 

F{D) = H + J{D)+K{D). 

The resultant Galerkin system of nonlinear equations for the coefficients matrix C G and 

the respective eigenvalues A = diag{Xi ,..., Aw;,), reads as 

F{D)C = SCA, C^SC = In, 

where the second equation represents the orthogonality constraints tpiipjdx = Sij, and In 
denotes the Nb x Nb identity matrix. 

3. Tensor approximation to the Newton kernel in 3D. 

Methods of separable approximation to multivariate spherically symmetric functions by using 
the Gaussian sums have been addressed in the chemical and mathematical literature since [U] and 
uniEi], respectively. 

In this section, we discuss for the readers convenience the grid-based method for the low-rank 
canonical and Tucker tensor representations of a spherically symmetric functions p(||x||), x G in 
the particular case of the 3D Newton kernel p(||x||) = n^, x G by its projection onto the set of 
piecewise constant basis functions, see [7] for more details. 

In the computational domain D = [—6/2, 6/2]^, let us introduce the uniform nxnxn rectangular 
Cartesian grid with the mesh size h = b/n (usually, n = 2k). Let {V’i} be a set of tensor-product 
piecewise constant basis functions, '4i(x) = n£=i tbe 3-tuple index i = ( 11 , 12 , 13 ), 

4 G {1, ...,n}, £ = 1, 2, 3. The kernel p(||x||) can be discretized by its projection onto the basis set 
{ipi} in the form of a third order tensor of size nxnxn, defined entrywise as 

P := [pi] G Pi = / Mx)p{M) dx. (7.6) 

Jr3 
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Given M, the low-rank canonical decomposition of the 3rd order tensor P is based on using 
exponentially convergent sine-quadratures for approximation of the Laplace-Gauss transform to 
the analytic function p{z) = Ijz as follows, 

„ „ M M3 

l/||x|| = ^/ (7.7) 

k=-M k=-M e=l 

where the quadrature points and weights are given by 

2 

tk = ^^M, flfc = —;=flM, flM = Co log(M)/M, Co > 0. 


Under the assumption 0 < a < ||x|| < oo, x = (xi,X 2 ,X 3 ) G this quadrature can be proven 
to provide the exponential convergence rate in M for a class of analytic functions, see [MlliniEl], 


l/\\x 


M 

E 

k=-M 


dk^ 




C /3Vm 


< — e 
a 


with some C, /? > 0. 


Combining (|7.6p and (j7.7l) . and taking into account the separability of the Gaussian basis functions, 
we arrive at the low-rank approximation to each entry of the tensor P, 


Pi ~ ^ Ofc / V’i(x)e ‘^ll^ll^dx= ^ / iljf^{xi)e 

k=-M k=-M e=i 

Define the vector (recall that > 0) 

L = l Jr 

then the 3rd order tensor P can be approximated by the i7-term canonical representation 


M3 R 

p^Pr= “fc (8) itk) = Y, ® pf ® pf e 

k=—M i=\ q=l 


(7.8) 


it) it 

where R = 2M -|- 1, and the canonical vectors are renumbered byA:—= P 5 =p),'^ G 

M"", i = 1,2,3. For the given threshold e > 0, M = 0{\ logep) is chosen as the minimal number, 
such that in the max-norm we have 


||P-Pfi|| <e||P||. 

The symmetric canonical tensor P/j in (|7.8p approximates the discretized 3D symmetric kernel 
function p(||x||) = l/||x|| {x G D), centered at the origin, such that pg^^ = pg^^ = pg^^ {q = 1, 
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